-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvclient: fiddle with assertions and rollbacks #48297
Conversation
❌ The GitHub CI (Cockroach) build has failed on 640f4d04. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
640f4d0
to
8a10574
Compare
❌ The GitHub CI (Cockroach) build has failed on 8a105744. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 10 of 10 files at r2, 2 of 2 files at r3, 1 of 1 files at r4, 3 of 3 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @irfansharif, and @tbg)
pkg/kv/txn_external_test.go, line 117 at r5 (raw file):
atomic.StoreInt64(&filterSet, 1) // This test uses a cluster with two nodes. It'll create a transaction
Is this going to be flaky if the lease moves around?
pkg/kv/txn_external_test.go, line 137 at r5 (raw file):
require.NoError(t, txn.Put(ctx, key, "val")) // If the test we want the transaction to be committed and cleaned up,
"If the test we want" sounds wrong.
pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 806 at r5 (raw file):
} func sanityCheckCommittedErr(ctx context.Context, pErr *roachpb.Error, ba roachpb.BatchRequest) {
Give this a comment indicating what it's sanity checking.
We had an assertion that only a TransactionAbortedError can carry an ABORTED transaction. This assertion seems pretty dubious though - we've just seen failed rollbacks trigger it. While that was a bug - and in fact we caught it due to this assertion (see cockroachdb#48245), it seems hard to ensure that rollbacks will not fail in the future. Release note: None
I'll need it in a future commit to check wait for ctx cancelation in the filter. Release note: None
I'll use them in another package in a future commit. Release note: None
Switch this utility method from collecting local traces to collecting distributed traces. The method is mostly used in tests, with a handfull of production uses. I don't think those production uses care. I'll need distributed traces in a future commit, when this function is used indirectly through the CheckPushResult() testing utility. Release note: None
This patch improves the assertion that requests don't get errors informing them that their transaction has been committed from underneath them. There is a semi-legitimate case that's hitting that assertion: a rollback sent after a commit returned an ambiguous result to the client. This happens easily when the commit's ctx times out - the client gets an ambiguous error and is likely to send a rollback attempt afterwards. This rollback attempt might find a transaction record with the COMMITTED status, in which case the assertion was hit. This patch makes the assertion white-list this case. We were already white-listing this specific case in other places. For other, unexpected cases, the assertion (re-)becomes a Fatalf (from an Errorf), since an unepectedly-committed transaction is no joke. This patch also adds a test for the case that was triggering the assertion. Also, it adds tests for another two most troubling cases of rollback-after-ambiguous-commit that we handle incorrectly. Those behaviors are not fixed; the test just reproduces and highlights them. Touches cockroachdb#48301 Touches cockroachdb#48302 Release note: None
8a10574
to
dd208c0
Compare
❌ The GitHub CI (Cockroach) build has failed on dd208c0e. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @nvanbenschoten, and @tbg)
pkg/kv/txn_external_test.go, line 117 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this going to be flaky if the lease moves around?
well yeah, but why would the lease move? It'd better not move; we have plenty of tests like this.
pkg/kv/txn_external_test.go, line 137 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"If the test we want" sounds wrong.
done
pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 806 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this a comment indicating what it's sanity checking.
done
Build succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 802 at r6 (raw file):
// bug. Requests are not supposed to be sent on transactions after they // are committed. log.Errorf(ctx, "transaction unexpectedly committed: %s. ba: %s. txn: %s.", pErr, ba, errTxn)
Sorry for not getting to it earlier, so feel free to ignore this (doubly so because it's nothing to do with this diff): I'm surprised this isn't a log.Fatal given it's a "serious bug".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @nvanbenschoten, and @tbg)
pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 802 at r6 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Sorry for not getting to it earlier, so feel free to ignore this (doubly so because it's nothing to do with this diff): I'm surprised this isn't a log.Fatal given it's a "serious bug".
see what's going in later commits
See individual commits.